Update COPY-FROM No.13 distributed #6004

Merged
merged 3 commits on Jul 21, 2023

Conversation

jjyaoao (Contributor) commented Jul 7, 2023

paddle-bot (bot) commented Jul 7, 2023

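Thank you for contributing to the PaddlePaddle documentation. The documentation preview is being built; once the Docs-New job finishes, you can preview it at: http://preview-pr-6004.paddle-docs-preview.paddlepaddle.org.cn/documentation/docs/zh/api/index_cn.html
For more information about the preview tool, see: PaddlePaddle Documentation Preview Tool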

sunzhongkai588 (Collaborator) left a comment

Let's fix these first. It looks like there are quite a few pitfalls here.

docs/api/paddle/distributed/QueueDataset_cn.rst (outdated)
strategy.recompute = True
strategy.recompute_configs = {"checkpoints": ["x"]}
strategy.save_to_prototxt("dist_strategy.prototxt")
COPY-FROM: paddle.distributed.fleet.DistributedStrategy
Collaborator

Suggested change
- COPY-FROM: paddle.distributed.fleet.DistributedStrategy
+ COPY-FROM: paddle.distributed.fleet.DistributedStrategy.save_to_prototxt


import paddle.distributed.fleet as fleet
fleet.init()
COPY-FROM: paddle.distributed.fleet.Fleet:code-example1
Collaborator

  • This is the code example under the init method, so change it to:
Suggested change
- COPY-FROM: paddle.distributed.fleet.Fleet:code-example1
+ COPY-FROM: paddle.distributed.fleet.Fleet.init:code-example1
  • Also, the code examples for the Fleet class itself (paddle.distributed.fleet.Fleet:code-example1 and paddle.distributed.fleet.Fleet:code-example2) should be referenced above the methods.
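The naming convention in the suggestions above (a dotted API path, optionally followed by `:` and an example label) can be sketched as a small parser. This is only an illustration: the regular expression and the names `CopyFromRef` and `parse_copy_from` are assumptions inferred from the directive lines quoted in this review, not the actual implementation of the Paddle docs tooling.

```python
import re
from typing import NamedTuple, Optional


class CopyFromRef(NamedTuple):
    """Hypothetical container for one parsed COPY-FROM line."""
    api_path: str          # dotted path, e.g. "paddle.distributed.fleet.Fleet.init"
    label: Optional[str]   # optional example label, e.g. "code-example1"


# Assumed syntax, based only on the lines quoted in this review:
#   COPY-FROM: <dotted.api.path>[:<example-label>]
_COPY_FROM = re.compile(
    r"^\s*COPY-FROM:\s*(?P<path>[\w.]+)(?::(?P<label>[\w-]+))?\s*$"
)


def parse_copy_from(line: str) -> Optional[CopyFromRef]:
    """Return the parsed reference, or None if the line is not a COPY-FROM line."""
    match = _COPY_FROM.match(line)
    if match is None:
        return None
    return CopyFromRef(match.group("path"), match.group("label"))


if __name__ == "__main__":
    # A method-level reference, as recommended in the review.
    print(parse_copy_from(
        "COPY-FROM: paddle.distributed.fleet.Fleet.init:code-example1"))
    # A reference without an example label.
    print(parse_copy_from(
        "COPY-FROM: paddle.distributed.fleet.UtilBase.all_reduce"))
```

Under this reading, a reference placed under a method such as init should carry the method name in its path, while a reference whose path ends at the class belongs above the method list.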


import paddle.distributed.fleet as fleet
fleet.init(is_collective=True)
COPY-FROM: paddle.distributed.fleet.Fleet:code-example2
Collaborator

Suggested change
- COPY-FROM: paddle.distributed.fleet.Fleet:code-example2
+ COPY-FROM: paddle.distributed.fleet.Fleet.init:code-example2

import paddle.distributed.fleet as fleet
role = fleet.PaddleCloudRoleMaker()
fleet.init(role)
COPY-FROM: paddle.distributed.fleet.Fleet:code-example3
Collaborator

Suggested change
- COPY-FROM: paddle.distributed.fleet.Fleet:code-example3
+ COPY-FROM: paddle.distributed.fleet.Fleet.init:code-example3


adam.step()
adam.clear_grad()
COPY-FROM: paddle.distributed.fleet.Fleet.clear_grad


minimize(loss, startup_program=None, parameter_list=None, no_grad_set=None)
Collaborator

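minimize doesn't seem to have any content. How about translating the corresponding English section into Chinese and bringing it over, and also pulling in the code with COPY-FROM?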

# [8, 12]
if __name__ == "__main__":
    train()
COPY-FROM: paddle.distributed.fleet.UtilBase
Collaborator

Suggested change
- COPY-FROM: paddle.distributed.fleet.UtilBase
+ COPY-FROM: paddle.distributed.fleet.UtilBase.all_reduce

Comment on lines 240 to 252
**代码示例**
.. code-block:: text
Collaborator

Suggested change
**代码示例**
.. code-block:: text
**代码示例**
.. code-block:: text

Collaborator

Please leave a blank line above and below every .. code-block:: text; otherwise it doesn't seem to render in the official site preview.
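A quick way to spot the problem described above is to scan an .rst file for code-block directives that are not surrounded by blank lines. The sketch below is a hypothetical helper (the name `find_unpadded_code_blocks` is an assumption, and it is not part of the Paddle docs tooling). It applies the reviewer's rule literally, so a directive followed directly by an option line such as `:linenos:` would also be flagged.

```python
from typing import List


def find_unpadded_code_blocks(
    rst_text: str, directive: str = ".. code-block::"
) -> List[int]:
    """Return 1-based line numbers of directive lines that lack a blank
    line directly before or directly after them."""
    lines = rst_text.splitlines()
    flagged = []
    for index, line in enumerate(lines):
        if not line.strip().startswith(directive):
            continue
        # The start and end of the file count as "blank" neighbours.
        blank_before = index == 0 or lines[index - 1].strip() == ""
        blank_after = index + 1 >= len(lines) or lines[index + 1].strip() == ""
        if not (blank_before and blank_after):
            flagged.append(index + 1)
    return flagged


if __name__ == "__main__":
    cramped = "**代码示例**\n.. code-block:: text\n    import os\n"
    padded = "**代码示例**\n\n.. code-block:: text\n\n    import os\n"
    print(find_unpadded_code_blocks(cramped))
    print(find_unpadded_code_blocks(padded))
```

Running such a check over the changed files before pushing would catch the blocks that fail to render in the preview, assuming the missing blank lines are indeed the cause.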

Collaborator

@SigureMo None of the DistributedStrategy code examples have ever been pulled in successfully with COPY-FROM. 001, could you take a look?

Member

Strange, the log clearly shows that they were found.

Collaborator

@SigureMo Some of the Fleet code examples can't be pulled in with COPY-FROM, probably for much the same reason as tensor_cn.rst. Should we skip this one?

Member

How about we skip all of them? Distributed is too painful.

Collaborator

> How about we skip all of them? Distributed is too painful.

+1, that way this task can be wrapped up sooner.

Collaborator

> How about we skip all of them? Distributed is too painful.

Actually, we could skip everything under paddle.distributed.fleet.

Collaborator

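@jjyaoao We discussed this:

  • DistributedStrategy_cn.rst and Fleet_cn.rst: restore them to their original state (do not switch to COPY-FROM); they will simply be skipped later. @SigureMo
  • PaddleCloudRoleMaker and UserDefinedRoleMaker: the English docs are missing and jj is already halfway through writing them, so switch these to COPY-FROM along the way.
  • HDFSClient_cn.rst: change all code blocks to text, to align with the English docs.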

jjyaoao force-pushed the jjy4 branch 2 times, most recently from 4ce6d38 to 6ca612c, on July 19, 2023 09:21
sunzhongkai588 (Collaborator) left a comment

For the code in PaddleCloudRoleMaker and UserDefinedRoleMaker, just use COPY-FROM, since I see the examples have all been added on the Paddle side. Otherwise there are no major issues.

Comment on lines 15 to 23
- .. code-block:: python
+ .. code-block:: text

import os
import paddle.distributed.fleet as fleet

os.environ["PADDLE_PSERVER_NUMS"] = "2"
os.environ["PADDLE_TRAINERS_NUM"] = "2"

os.environ["POD_IP"] = "127.0.0.1"
os.environ["PADDLE_PORT"] = "36001"
os.environ["TRAINING_ROLE"] = "PSERVER"
os.environ["PADDLE_PSERVERS_IP_PORT_LIST"] = \
    "127.0.0.1:36001,127.0.0.2:36001"

os.environ["PADDLE_TRAINER_ID"] = "0"

fleet.PaddleCloudRoleMaker(is_collective=False)
from paddle.distributed.fleet.base.role_maker import Role
fleet.UserDefinedRoleMaker(
    current_id=0,
    role=Role.SERVER,
    worker_num=2,
    server_endpoints=["127.0.0.1:36011", "127.0.0.1:36012"])
Collaborator

For this part, just COPY-FROM PaddleCloudRoleMaker; I see the examples were all added to the English source in PaddlePaddle/Paddle#55236.

@@ -38,15 +37,13 @@ string

Collaborator

For all of the code above, just COPY-FROM UserDefinedRoleMaker, since the examples have all been added to the English source.

sunzhongkai588 (Collaborator) left a comment

One last change: don't use COPY-FROM for QueueDataset; just change the block to text. T.T


os.remove("./test_queue_dataset_run_a.txt")
os.remove("./test_queue_dataset_run_b.txt")
COPY-FROM: paddle.distributed.QueueDataset.init
Collaborator

Better not to use COPY-FROM for this init method. Even the English code example doesn't render in the preview, so change the block to text instead.

sunzhongkai588 (Collaborator) left a comment

LGTM

4 participants